System for processing data and method thereof



The invention relates to a system for processing data, the system comprising a
first source having first data, a second source having second data, and a server. The invention
further relates to a method of processing data and a server for processing data.

An information system comprising a plurality of user devices for storing user
data expressing user preferences for media content, purchases, etc. is known. Such an
information system typically comprises a server collecting the user data. The user data is
analyzed to determine correlations between the user data and to provide a particular
service to one or more users. For example, a collaborative filtering technique is a method for
content recommendation that combines the interests of a large group of users.

Memory-based collaborative filtering techniques are based on determining
correlations (similarities) between different users, for which the ratings of each user are
compared to the ratings of each other user. These similarities are used to predict how much a
particular user will like a particular piece of content. For the prediction step, various
alternatives exist. Apart from determining the similarities between users, one may determine
similarities between items, based on rating patterns received from the users.

A problem in this context is the protection of the privacy of the users, who 
don't want to reveal their interests to a server or to other users. 

It is an object of the present invention to obviate the drawbacks of the prior art
system, and provide a system for processing data, where the user privacy is protected.

This object is realized in that the system comprises
- a first source for encrypting first data, and a second source for encrypting
second data,
- a server configured to obtain the encrypted first and second data, the server
being precluded from decrypting the encrypted first and second data, and from revealing
identities of the first and second sources to each other,
- computation means for performing a computation on the encrypted first and
second data to obtain a similarity value between the first and second data so that the first and
second data is anonymous to the second and first sources respectively, the similarity value
providing an indication of a similarity between the first and second data.
In one embodiment of the present invention, the similarity value is obtained
using a Pearson correlation or a Kappa statistic. In another embodiment, the computation
means is realized using a Paillier cryptosystem, or a threshold Paillier cryptosystem using a
public key-sharing scheme.

The computational steps required for determining the similarity value
comprise a calculation of, for example, vector inner products and sums of shares. After the
computation, encryption techniques are applied to the data to protect them. In a sense, this
means that only encrypted information is sent to the server, and all computations are done in
the encrypted domain.

In a further embodiment of the present invention, the first or second data
comprises a user profile of a first or second user respectively, the user profile indicating user
preferences of the first or second user for media content items. In another example, the first or
second data comprises user ratings of respective content items.

An advantage of the invention is that user information is protected. The
invention can be used in various kinds of recommendation services, such as music or TV
show recommendation, but also medical or financial recommendation applications where the
privacy protection may be very important.

The object of the invention is also realized in that the method of processing
data comprises steps of enabling to
- encrypt first data for a first source, and encrypt second data for a second
source,
- provide the encrypted first and second data to a server that is precluded from
decrypting the encrypted first and second data, and from revealing identities of the first and
second sources to each other,
- perform a computation on the encrypted first and second data to obtain a
similarity value between the first and second data so that the first and second data is
anonymous to the second and first sources respectively, the similarity value providing an
indication of a similarity between the first and second data.

The method describes the operation of the system of the present invention. 

In one embodiment, the method further comprises a step of using the similarity
value to obtain a recommendation of a content item for the first or second source. For
example, suppose we want to predict the score of an item i for active user a:

1. First, we compute the correlation between user a and every other user x. This
is done by computing inner products between the rating vector of user a and each other user
x, through an exchange via the server. In this way, user a knows the correlation value with
each other user x = 1,2,...,n, but he does not know who user 1,2,...,n is. On the other hand, the
server knows who user 1,2,...,n is, but it does not know the correlation values.

2. Next, we compute a prediction for item i for user a by taking a kind of
weighted average of the ratings of users 1,2,...,n for this item, where the weights are given by
the correlation values. The procedure for this is that user a encrypts the correlation values and
sends them to the server, which forwards them to the respective users 1,2,...,n. Each user
x = 1,2,...,n multiplies the encrypted correlation value he receives with the rating he gave for
item i, and sends the result back to the server. The server, still not able to decrypt anything at
all, then combines the encrypted products of the users 1,2,...,n into an encrypted sum, and
sends this end result back to user a, who can decrypt it to get the desired result.

Claim 6 describes the operation of the system including the first and second
sources, and the server. Claim 12 is directed to the operation of the server ensuring the user
privacy and enabling the computation of the similarity value in the encrypted domain. Both
claims are interrelated and directed to essentially the same invention.



These and other aspects of the invention will be further explained and 
described with reference to the following drawings: 

Figure 1 is a functional block diagram of an embodiment of a system 
according to the present invention; 

Figure 2 is an embodiment of the method of the present invention. 



According to an embodiment of the present invention, a system 100 is shown
in Figure 1. The system comprises a first device 110 (a first source), and a plurality of second
devices 190, 191, ..., 199 (second sources). A server 150 is coupled to the first device and the
second devices. The first device has first data, for example, user ratings of media content, or
user preference data with respect to goods on sale, or medical records of a user indicating a
prescription to give preference to certain food products, etc. The second device has second
data, for example, the second data relate to preferences of a second user.

In one example, the first device is a TV set-top box arranged to store user
ratings for TV programs. The first device is further arranged to obtain EPG data (Electronic
Programme Guide) indicating, e.g., a broadcast time, a channel, a title, etc. of a respective
TV program. The first device is arranged to store a user profile storing user ratings for
respective TV programs. The user profile may not comprise ratings for all programs in the
EPG data. To determine whether a user will like a particular program which the user did not
rate, various recommendation techniques can be used. For example, collaborative filtering
techniques are used. Then, the first device collaborates with the second device storing the
second data comprising a second user profile to find out whether the second profile is similar
(using a similarity value) to the first profile and includes a rating of the particular program. If
the similarity value between the first and second profiles is higher than a predetermined
threshold, the rating included in the second profile is used to determine whether a user of the
first device would like that particular program or not (a prediction step).

For instance, a kappa statistic or Pearson correlation may be used for
determining the similarity measure between the first and second profiles.

The similarity may be a distance between two profiles, the correlation, or a
measure of the number of equal votes between two profiles. For the calculation of
predictions, it is necessary that the similarities are high if users have the same taste, and low
if they have an opposite taste. For example, the distance calculates the total difference in
votes between the users. The distance is zero if the users have exactly the same taste. The
distance is high if the users behave totally opposite. Therefore we have to do an adjustment
such that the weights are high if the users vote the same. A simple distance measure is the
known Manhattan distance.

In one example, if the second profile is sufficiently similar to the first profile
(based on the similarity value), all content items (TV programs) not rated in the first profile
but rated in the second profile are found. Said items are recommended to a user associated with the
first profile. The recommendation may be based on the ratings of the items in the second
profile, prediction methods for calculating predicted ratings of the items for the user of the
first profile on the basis of the similarity value between the first and second profile, etc.

It should be noted that the similarity value can be used not only in the context
of the collaborative filtering techniques (in the content recommendation field) but, generally,
for a personalization of media content, a targeted advertising to users, matchmaking services,
and other applications.

A user privacy problem arises because, in the prior art systems, the
calculation of the similarity value requires that the first data of the first device and/or the
second data of the second device are communicated to the second device and the first device
respectively, or to the server.
The first device encrypts the first data, and the second device encrypts the
second data. The first and second data are sent to the server. The server is not capable of
decrypting the encrypted first and second data. Further, the server ensures that when the
second device obtains the encrypted first data, the second device cannot identify the identity
of the first device. In turn, the first device cannot identify that the encrypted second data
originate from the second device when the first device receives the second data. Thus, the
server is precluded from decrypting the encrypted first and second data, and from revealing
identities of the first and second sources to each other.

For example, the server stores a database comprising a first identifier of the
first device and a second identifier of the second device. When the first device transmits the
encrypted first data to the second device via the server, the server strips away the first
identifier attached to the encrypted first data, and the server delivers only the encrypted first
data without the first identifier to the second device.
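The identifier handling described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class `RelayServer`, its method names, and the use of a random session pseudonym in place of the stripped identifier are all assumptions made for the example.

```python
import secrets


class RelayServer:
    """Hypothetical relay: forwards ciphertexts but hides who sent them."""

    def __init__(self):
        # Session pseudonym -> real device identifier; kept on the server only.
        self._pseudonym_to_device = {}

    def forward(self, sender_id, ciphertext):
        """Strip the sender's identifier and attach a fresh session pseudonym."""
        pseudonym = secrets.token_hex(8)
        self._pseudonym_to_device[pseudonym] = sender_id
        return {"reply_to": pseudonym, "payload": ciphertext}

    def route_reply(self, pseudonym):
        """Resolve a pseudonym back to the real device (server-side only)."""
        return self._pseudonym_to_device.pop(pseudonym)


server = RelayServer()
message = server.forward("device-110", b"\x01\x02")   # what the second device receives
```

The receiving device sees only the opaque payload and a pseudonym it can answer to; the server can route the answer back but never sees plaintext.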

It should be noted that the computation on the encrypted first and second data
may be performed in a number of alternative manners. For example, the first device encrypts
the first data and sends the encrypted first data to the second device via the server. The
second device calculates encrypted inner products between the first encrypted data and the
second data. The second device sends the encrypted inner products to the first device via the
server. The first device decrypts the encrypted inner products, and calculates the similarity
value between the first and second data. The first device obtains the similarity but the first
device cannot identify the source of the second data.

Alternatively, the computations are performed completely on the server that
has obtained the encrypted first data and the encrypted second data. In a further alternative,
the computations are performed partly on the server and partly by the second device. The first
device only decrypts the inner product and obtains the similarity value. Other alternatives can
be derived.

Figure 2 shows an embodiment of the method of the present invention. In step
210, first data for a first source are encrypted, and second data for a second source are
encrypted. In step 220, the encrypted first and second data are provided to a server 150. The
server is precluded from decrypting the encrypted first and second data, and from revealing
identities of the first and second sources to each other. In step 230, a computation is
performed on the encrypted first and second data to obtain a similarity value between the first
and second data so that the first and second data is anonymous to the second and first sources
respectively. The similarity value provides an indication of a similarity between the first and
second data. Optionally, in step 240 the similarity value is used to obtain a recommendation
of a content item for the first or second source. Further embodiments of the steps 210, 220,
230 and 240 are discussed in detail in the next paragraphs.

Methods exist for the following two problems:

1. Given two parties that each have a secret vector of integers, determine the
inner product between the vectors without any of the parties having to reveal the specific
information.

2. Given a set of parties that each have a secret number, determine the sum of the
numbers without any of the parties having to reveal the number.

The first problem is solved, for example, by the Paillier cryptosystem. The
second problem is handled by using a key-sharing scheme (also Paillier), where decryption
can only be done if a sufficient number of parties cooperate (and then only the sum is
revealed, no detailed information).

Memory-based collaborative filtering

Most memory-based collaborative filtering approaches work by first
determining similarities between users, by comparing their jointly rated items. Next, these
similarities are used to predict the rating of a user for a particular item, by interpolating
between the ratings of the other users for this item. Typically, all computations are performed
by the server, upon a user request for a recommendation.

Next to the above approach, which is called a user-based approach, one can
also follow an item-based approach. Then, first similarities are determined between items, by
comparing the ratings they have received from the various users, and next the rating of a user
for an item is predicted by interpolating between the ratings that this user has given for the
other items.

Before discussing the formulas underlying both approaches, we first introduce
some notation. We assume a set $U$ of users and a set $I$ of items. Whether a user $u \in U$ has
rated item $i \in I$ is indicated by a boolean variable $b_{ui}$, which equals one if the user has done
so and zero otherwise. In the former case, also a rating $r_{ui}$ is given, e.g. on a scale from 1 to
5. The set of users that have rated an item $i$ is denoted by $U_i$, and the set of items rated by a
user $u$ is denoted by $I_u$.

The user-based approach 
User-based algorithms are widely used collaborative filtering algorithms. As
described above, there are two main steps: determining similarities and calculating
predictions. For both we discuss commonly used formulas, of which we show later that they
all can be computed on encrypted data.

Similarity measures

Many similarity measures have been presented in the literature, for example,
correlation measures, distance measures, and counting measures.
The well-known Pearson correlation coefficient is given by

\[ s(u,v) = \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u \cap I_v} (r_{ui} - \bar{r}_u)^2 \, \sum_{i \in I_u \cap I_v} (r_{vi} - \bar{r}_v)^2}} \tag{1} \]

where $\bar{r}_u$ denotes the average rating of user $u$ for the items he has rated. The numerator in this
equation gets a positive contribution for each item that is either rated above average by both
users $u$ and $v$, or rated below average by both. If one user has rated an item above average
and the other user below average, we get a negative contribution. The denominator in the
equation normalizes the similarity, to fall in the interval [-1;1], where a value 1 indicates
complete correspondence and -1 indicates completely opposite tastes.

Related similarity measures are obtained by replacing $\bar{r}_u$ in (1) by the middle
rating (e.g. 3 if using a scale from 1 to 5) or by zero. In the latter case, the measure is called
vector similarity or cosine, and if all ratings are non-negative, the resulting similarity value
will then lie between 0 and 1.

Distance measures

Another type of measures is given by distances between two users' ratings,
such as the mean-square difference given by

\[ \frac{\sum_{i \in I_u \cap I_v} (r_{ui} - r_{vi})^2}{|I_u \cap I_v|} \tag{2} \]

or the normalized Manhattan distance given by

\[ \frac{\sum_{i \in I_u \cap I_v} |r_{ui} - r_{vi}|}{|I_u \cap I_v|} \tag{3} \]
Such a distance is zero if the users rated their overlapping items identically, and larger
otherwise. A simple transformation converts a distance into a measure that is high if users'
ratings are similar and low otherwise.
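As a plaintext reference point, the Pearson correlation, the mean-square difference and the normalized Manhattan distance can be sketched as below. The dictionary representation of a profile (item to rating), the function names, and the linear distance-to-similarity transformation are assumptions chosen for illustration; the text does not prescribe a particular transformation.

```python
from math import sqrt


def common_items(ru, rv):
    """Items rated by both users (the intersection of their rated sets)."""
    return sorted(set(ru) & set(rv))


def pearson(ru, rv):
    items = common_items(ru, rv)
    mu = sum(ru.values()) / len(ru)   # average over all items user u rated
    mv = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in items)
    den = sqrt(sum((ru[i] - mu) ** 2 for i in items)
               * sum((rv[i] - mv) ** 2 for i in items))
    return num / den if den else 0.0


def mean_square_difference(ru, rv):
    items = common_items(ru, rv)
    return sum((ru[i] - rv[i]) ** 2 for i in items) / len(items)


def manhattan(ru, rv):
    items = common_items(ru, rv)
    return sum(abs(ru[i] - rv[i]) for i in items) / len(items)


def distance_to_similarity(d, d_max):
    """One possible transformation (an assumption): 1 at distance 0, 0 at d_max."""
    return 1.0 - d / d_max


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}
```

With ratings on a scale from 1 to 5, `d_max` would be 4 for the Manhattan distance and 16 for the mean-square difference.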

Counting measures

Counting measures are based on counting the number of items that two users
rated (nearly) identically. A simple counting measure is the majority voting measure given by

\[ s(u,v) = (2 - \gamma)^{c_{uv}} \, \gamma^{d_{uv}}, \tag{4} \]

where $0 < \gamma < 1$, $c_{uv} = |\{ i \in I_u \cap I_v \mid r_{ui} \approx r_{vi} \}|$ gives the number of items rated 'the
same' by $u$ and $v$, and $d_{uv} = |I_u \cap I_v| - c_{uv}$ gives the number of items rated 'differently'.
The relation $\approx$ may here be defined as exact equality, but also nearly matching ratings may be
considered sufficiently equal.

Another counting measure is given by the weighted kappa statistic [5], which
is defined as the ratio between the observed agreement between two users and the maximum
possible agreement, where both are corrected for agreement by chance.
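The majority voting measure can be sketched as follows, assuming it has the form $(2-\gamma)^{c_{uv}}\gamma^{d_{uv}}$ with $0 < \gamma < 1$. The parameter name `tolerance`, which models "nearly matching" ratings, and the default `gamma=0.5` are assumptions for the example.

```python
def majority_voting(ru, rv, gamma=0.5, tolerance=0):
    """Similarity from counts of 'same' (c) and 'different' (d) ratings.

    ru, rv: dicts mapping item -> rating.
    tolerance: ratings within this distance count as 'the same' (assumption).
    """
    common = set(ru) & set(rv)
    c = sum(1 for i in common if abs(ru[i] - rv[i]) <= tolerance)
    d = len(common) - c
    return (2 - gamma) ** c * gamma ** d


u = {"a": 5, "b": 3, "c": 1}
v = {"a": 5, "b": 2, "c": 1}
```

Each agreeing item multiplies the similarity by a factor above one and each disagreeing item by a factor below one, so the value grows with agreement.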

Prediction formulas

The second step in collaborative filtering is to use the similarities to compute a
prediction for a certain user-item pair. Also for this step several variants exist. For
all formulas, we assume that there are users that have rated the given item; otherwise
no prediction can be made.

Weighted sums. The first prediction formula we show is given by

\[ \hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in U_i} s(u,v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in U_i} |s(u,v)|}. \tag{5} \]

So, the prediction is the average rating of user $u$ plus a weighted sum of deviations
from the averages. In this sum, all users are considered that have rated item $i$.
Alternatively, one may restrict them to users that also have a sufficiently high similarity to
user $u$, i.e., we sum over all users in $U_i(t) = \{ v \in U_i \mid s(u,v) > t \}$ for some threshold
$t$.

An alternative, somewhat simpler prediction formula is given by

\[ \hat{r}_{ui} = \frac{\sum_{v \in U_i} s(u,v)\, r_{vi}}{\sum_{v \in U_i} |s(u,v)|}. \tag{6} \]

Note that if all ratings are positive, then this formula only makes sense if all
similarity values are non-negative, which may be realized by choosing a non-negative
threshold.
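A plaintext sketch of the weighted-sum prediction described above: the average rating of user u plus a similarity-weighted sum of the other users' deviations from their own averages. Normalizing by the sum of absolute similarities is an assumption of this sketch, as are the function name and the tuple layout.

```python
def predict_weighted(u_mean, neighbours, threshold=None):
    """Weighted-sum prediction for one user-item pair.

    u_mean: average rating of the active user.
    neighbours: list of (similarity, rating_for_item, neighbour_mean) tuples,
        one per user that rated the item.
    threshold: if given, only neighbours with similarity > threshold are used.
    """
    if threshold is not None:
        neighbours = [n for n in neighbours if n[0] > threshold]
    norm = sum(abs(s) for s, _, _ in neighbours)
    if norm == 0:
        return None  # no usable neighbour, so no prediction can be made
    return u_mean + sum(s * (r - mean) for s, r, mean in neighbours) / norm


prediction = predict_weighted(3.0, [(0.8, 5, 4.0), (-0.5, 2, 3.0)])
```

A negatively correlated neighbour who rated the item below his own average pushes the prediction up, which is the intended effect of signed weights.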
Maximum total similarity. A second type of prediction formula is given by
choosing the rating that maximizes a kind of total similarity, as is done in the
majority voting approach, given by

\[ \hat{r}_{ui} = \arg\max_{x \in X} \sum_{v \in U_i^x} s(u,v), \tag{7} \]

where $U_i^x = \{ v \in U_i \mid r_{vi} \approx x \}$ is the set of users that gave item $i$ a rating similar
to value $x$. Again, the relation $\approx$ may be defined as exact equality, but also nearly
matching ratings may be allowed. Also in this formula one may use $U_i(t)$ instead of
$U_i$ to restrict oneself to sufficiently similar users.
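The maximum total similarity prediction can be sketched in plaintext as below. The rating scale 1 to 5, the `tolerance` parameter for "similar" ratings, and the tie-breaking rule (the lowest rating wins) are assumptions of this example.

```python
def predict_max_total_similarity(neighbours, ratings=(1, 2, 3, 4, 5), tolerance=0):
    """Pick the rating whose supporters have the largest total similarity.

    neighbours: list of (similarity, rating_for_item) tuples.
    """
    totals = {
        x: sum(s for s, r in neighbours if abs(r - x) <= tolerance)
        for x in ratings
    }
    # max() returns the first maximal element, so ties go to the lowest rating.
    return max(ratings, key=lambda x: totals[x])


best = predict_max_total_similarity([(0.9, 4), (0.3, 5), (0.4, 5), (0.2, 1)])
```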

Time complexity

The time complexity of user-based collaborative filtering is $O(m^2 n)$, where $m = |U|$
is the number of users and $n = |I|$ is the number of items, as can be seen as
follows. For the first step, a similarity has to be computed between each pair of
users ($O(m^2)$), each of which requires a run over all items ($O(n)$). If for all users all
items with a missing rating are to be given a prediction, then this requires $O(mn)$
predictions to be computed, each of which requires sums of $O(m)$ terms.



The item-based approach

Item-based algorithms first compute similarities between items, e.g. by using a
similarity measure

\[ s(i,j) = \frac{\sum_{u \in U_i \cap U_j} (r_{ui} - \bar{r}_u)(r_{uj} - \bar{r}_u)}{\sqrt{\sum_{u \in U_i \cap U_j} (r_{ui} - \bar{r}_u)^2 \, \sum_{u \in U_i \cap U_j} (r_{uj} - \bar{r}_u)^2}}. \tag{8} \]

Note that the exchange of users and items as compared to (1) is not complete, as still
the average rating $\bar{r}_u$ is subtracted from the ratings. The reason to do so is that this
subtraction compensates for the fact that some users give higher ratings than others,
and there is no need for such a correction for items.

The standard item-based prediction formula to be used for the second step is
given by equation (9).

The other similarity measures and prediction formulas we presented for the
user-based approach can in principle also be turned into item-based variants, but we will
not show them here.

Also in the time complexity for item-based collaborative filtering the roles of
users and items interchange as compared to the user-based approach, as expected.
Hence, the time complexity is given by $O(m n^2)$ instead of $O(m^2 n)$. If the number
$m$ of users is much larger than the number $n$ of items, the time complexity of
the item-based approach is favorable over that of user-based collaborative filtering.
Another advantage in this case is that the similarities are generally based on more elements,
which gives more reliable measures. A further advantage of item-based collaborative filtering
is that correlations between items may be more stable than correlations between users.

Encryption

In the next sections we show how the presented formulas for collaborative
filtering can be computed on encrypted ratings. Before doing so, we present the encryption
system we use, and the specific properties it possesses that allow for the computation on
encrypted data.



A public-key cryptosystem

The cryptosystem we use is the public-key cryptosystem presented by Paillier.
We briefly describe how data is encrypted.

First, encryption keys are generated. To this end, two large primes $p$ and $q$ are
chosen randomly, and we compute $n = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$. Furthermore, a generator $g$ is
computed from $p$ and $q$ (for details, see P. Paillier. Public-key cryptosystems based on
composite degree residuosity classes. Advances in Cryptology - EUROCRYPT'99, Lecture
Notes in Computer Science, 1592:223-238, 1999). Now, the pair $(n, g)$ forms the public key
of the cryptosystem, which is sent to everyone, and $\lambda$ forms the private key, to be used for
decryption, which is kept secret.

Next, a sender who wants to send a message $m \in \mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ to a
receiver with public key $(n, g)$ computes a ciphertext $\varepsilon(m)$ by

\[ \varepsilon(m) = g^m r^n \bmod n^2, \tag{10} \]

where $r$ is a number randomly drawn from $\mathbb{Z}_n^* = \{ x \in \mathbb{Z} \mid 0 < x < n \wedge \gcd(x, n) = 1 \}$.
This $r$ prevents decryption by simply encrypting all possible values of $m$ (in case it
can only assume a few values) and comparing the end result. The Paillier system is
hence called a randomized encryption system.

Decryption of a ciphertext $c = \varepsilon(m)$ is done by computing

\[ m = \frac{L(c^{\lambda} \bmod n^2)}{L(g^{\lambda} \bmod n^2)} \bmod n, \]

where $L(x) = (x-1)/n$ for any $0 < x < n^2$ with $x \equiv 1 \pmod{n}$. During decryption,
the random number $r$ cancels out.
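The key generation, encryption and decryption steps can be illustrated with a toy implementation. This is a sketch under stated assumptions, not a secure implementation: the primes are tiny, the generator is fixed to $g = n + 1$ (a commonly used valid choice, whereas the text defers to Paillier's paper for computing $g$), and the function names are hypothetical. It relies on `pow(x, -1, n)` for modular inverses, which needs Python 3.8 or later.

```python
import math
import secrets


def keygen(p, q):
    """Toy key generation; p and q must be distinct primes (huge in practice)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                           # assumed generator choice
    return (n, g), lam                                  # public key, private key


def L(x, n):
    """L(x) = (x - 1) / n, exact for x congruent to 1 modulo n."""
    return (x - 1) // n


def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1                # 0 < r < n
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2     # g^m * r^n mod n^2


def decrypt(pub, lam, c):
    n, g = pub
    n2 = n * n
    mu = pow(L(pow(g, lam, n2), n), -1, n)              # inverse of the denominator mod n
    return (L(pow(c, lam, n2), n) * mu) % n


pub, lam = keygen(47, 59)
ciphertext = encrypt(pub, 42)
```

Because a fresh `r` is drawn for every call, encrypting the same message twice almost always gives different ciphertexts, which is the randomization property mentioned above.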
Note that in the above cryptosystem the messages $m$ are integers. However,
rational values are possible by multiplying them by a sufficiently large number and rounding
off. For instance, if we want to use messages with two decimals, we simply multiply them by
100 and round off. Usually, the range $\mathbb{Z}_n$ is large enough to allow for this multiplication.
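The scaling trick can be sketched as follows. Representing negative values (such as ratings with the user's average subtracted) by wrapping around modulo $n$ is an assumption of this sketch; the text only mentions multiplying and rounding. The names `SCALE`, `encode` and `decode` are hypothetical.

```python
SCALE = 100  # two decimals, as in the text


def encode(value, n):
    """Map a rational value to Z_n; negatives wrap around modulo n (assumption)."""
    return round(value * SCALE) % n


def decode(m, n, factors=1):
    """Invert encode; `factors` is how many scaled values were multiplied together."""
    if m > n // 2:
        m -= n  # interpret the upper half of Z_n as negative numbers
    return m / SCALE ** factors


n = 1009 * 1013                                       # toy modulus
product = encode(1.5, n) * encode(-0.5, n) % n        # scaled by SCALE squared
```

When two scaled values are multiplied, as happens in an inner product, the result carries the scale factor twice, which is why `decode` takes the number of factors.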

Properties

The above presented encryption scheme has the following nice properties. The first
one is that

\[ \varepsilon(m_1)\,\varepsilon(m_2) \equiv g^{m_1} r_1^n \, g^{m_2} r_2^n = g^{m_1 + m_2} (r_1 r_2)^n \equiv \varepsilon(m_1 + m_2) \pmod{n^2}, \]

which allows us to compute sums on encrypted data. Secondly,

\[ \varepsilon(m_1)^{m_2} \equiv (g^{m_1} r_1^n)^{m_2} = g^{m_1 m_2} (r_1^{m_2})^n \equiv \varepsilon(m_1 m_2) \pmod{n^2}, \]

which allows us to compute products on encrypted data. An encryption scheme
with these two properties is called a homomorphic encryption scheme. The Paillier
system is one homomorphic encryption scheme, but more exist.

We can use the above properties to calculate sums of products, as required for
the similarity measures and predictions, using

\[ \prod_j \varepsilon(a_j)^{b_j} \equiv \prod_j \varepsilon(a_j b_j) \equiv \varepsilon\Bigl(\sum_j a_j b_j\Bigr) \pmod{n^2}. \tag{11} \]

So, using this, two users a and b can compute an inner product between a vector
of each of them in the following way. User a first encrypts his entries $a_j$ and sends
them to b. User b then computes (11), as given by the left-hand term, and sends the
result back to a. User a next decrypts the result to get the desired inner product.
Note that neither user a nor user b can observe the data of the other user; the only
thing user a gets to know is the inner product.

A final property we want to mention is that

\[ \varepsilon(m_1)\,\varepsilon(0) \equiv g^{m_1} r_1^n \, g^0 r_2^n = g^{m_1} (r_1 r_2)^n \equiv \varepsilon(m_1) \pmod{n^2}. \]

This action, which is called (re)blinding, can be used also to avoid a trial-and-error
attack as discussed above, by means of the random number $r_2 \in \mathbb{Z}_n^*$. We will use this
further on.
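The inner-product exchange between users a and b can be sketched end to end with a toy Paillier implementation. The tiny primes, the generator choice $g = n + 1$, and all function names are assumptions made for illustration; the reblinding with an encryption of zero follows the last property above.

```python
import math
import secrets


def keygen(p, q):
    """Toy keys: returns (n, lambda); g = n + 1 is implied (assumption)."""
    n = p * q
    return n, (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)


def encrypt(n, m):
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt(n, lam, c):
    n2 = n * n
    l_of = lambda x: (x - 1) // n
    return l_of(pow(c, lam, n2)) * pow(l_of(pow(n + 1, lam, n2)), -1, n) % n


def encrypted_inner_product(n, enc_a, b):
    """Run by user b: product of enc(a_j)^b_j, reblinded with a fresh enc(0)."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, b_j in zip(enc_a, b):
        acc = acc * pow(c, b_j, n2) % n2
    return acc


n, lam = keygen(1009, 1013)                  # user a's key pair
a = [5, 3, 0, 1]                             # user a's private vector
b = [4, 0, 2, 5]                             # user b's private vector
enc_a = [encrypt(n, a_j) for a_j in a]       # sent from a to b
result = decrypt(n, lam, encrypted_inner_product(n, enc_a, b))
```

User b only ever handles ciphertexts under a's key, and user a only learns the single number `result`, here the inner product of the two vectors.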

Encrypted user-based algorithm

It is further explained how user-based collaborative filtering can be performed
on encrypted data, in order to compute a prediction $\hat{r}_{ui}$ for a certain user $u$ and item $i$. We
consider a setup as depicted in Figure 1, where the first device 110 (user $u$) communicates
with the second devices 190, 191, 199 (other users $v$) through the server 150. Furthermore,
each user has generated his own key, and has published the public part of it. As we want to
compute a prediction for user $u$, the steps below will use the keys of $u$.



Computing similarities on encrypted data

First we take the similarity computation step, for which we start with the
Pearson correlation given in (1). Although we already explained how to compute
an inner product on encrypted data, we have to resolve the problem that the iterator
$i$ in the sums in (1) only runs over $I_u \cap I_v$, and this intersection is not known to either
user. Therefore, we first introduce

\[ q_{ui} = \begin{cases} r_{ui} - \bar{r}_u & \text{if } b_{ui} = 1 \text{, i.e., user } u \text{ rated item } i, \\ 0 & \text{otherwise,} \end{cases} \]

and rewrite (1) into

\[ s(u,v) = \frac{\sum_{i \in I} q_{ui} q_{vi}}{\sqrt{\sum_{i \in I} q_{ui}^2 b_{vi} \, \sum_{i \in I} b_{ui} q_{vi}^2}}. \]

The idea that we used is that any $i \notin I_u \cap I_v$ does not contribute to any of the three
sums because at least one of the factors in the corresponding term will be zero.
Hence, we have rewritten the similarity into a form consisting of three inner
products, each between a vector of $u$ and one of $v$.

The protocol now runs as follows. First, user $u$ calculates encrypted entries
$\varepsilon(q_{ui})$, $\varepsilon(q_{ui}^2)$, and $\varepsilon(b_{ui})$ for all $i \in I$, using (10), and sends them to the server. The
server forwards these encrypted entries to each other user $v_1, \ldots, v_{m-1}$. Next, each
user $v_j$, $j = 1, \ldots, m-1$, computes $\varepsilon(\sum_{i \in I} q_{ui} q_{v_j i})$, $\varepsilon(\sum_{i \in I} q_{ui}^2 b_{v_j i})$, and $\varepsilon(\sum_{i \in I} b_{ui} q_{v_j i}^2)$,
using (11), and sends these three results back to the server, which forwards them to
user $u$. User $u$ can decrypt the total of $3(m-1)$ results and compute the similarities
$s(u, v_j)$, for all $j = 1, \ldots, m-1$. Note that user $u$ now knows similarity values with
the other $m-1$ users, but he need not know who each user $j = 1, \ldots, m-1$ is. The
server, on the other hand, knows who each user $j = 1, \ldots, m-1$ is, but it does not
know the similarity values.
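The rewriting step can be checked in plaintext: the Pearson correlation restricted to the jointly rated items equals an expression built from three inner products over all items, using mean-centred ratings that are zero where a user has not rated. The sketch below assumes profiles are dictionaries and that the three inner products pair $q_u$ with $q_v$, $q_u^2$ with $b_v$, and $b_u$ with $q_v^2$; the function names are hypothetical.

```python
from math import sqrt


def masked_vectors(ratings, items):
    """Return (q, b): mean-centred ratings and 0/1 'has rated' flags over all items."""
    mean = sum(ratings.values()) / len(ratings)
    b = [1 if i in ratings else 0 for i in items]
    q = [ratings[i] - mean if i in ratings else 0.0 for i in items]
    return q, b


def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))


def pearson_from_inner_products(ru, rv, items):
    qu, bu = masked_vectors(ru, items)
    qv, bv = masked_vectors(rv, items)
    num = dot(qu, qv)                                    # first inner product
    den = sqrt(dot([x * x for x in qu], bv)              # second inner product
               * dot(bu, [x * x for x in qv]))           # third inner product
    return num / den


items = ["a", "b", "c", "d"]
u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}
similarity = pearson_from_inner_products(u, v, items)
```

In the encrypted protocol each of these three `dot` calls would be replaced by the homomorphic inner-product exchange, with the values scaled to integers first.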

For the other similarity measures, we can also derive computation schemes
using encrypted data only. For the mean-square distance, we can rewrite (2) into

\[ \frac{\sum_{i \in I} r_{ui}^2 b_{vi} - 2 \sum_{i \in I} r_{ui} r_{vi} + \sum_{i \in I} b_{ui} r_{vi}^2}{\sum_{i \in I} b_{ui} b_{vi}}, \]

where we additionally define $r_{ui} = 0$ if $b_{ui} = 0$ in order to have well-defined values.
So, this distance measure can also be computed by means of four inner products.
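A plaintext check of the four-inner-product form: expanding the squared difference over the jointly rated items gives three inner products in the numerator and one in the denominator, provided unrated entries are set to zero. The function name and the dictionary representation are assumptions.

```python
def msd_from_inner_products(ru, rv, items):
    """Mean-square difference via four inner products over all items."""
    r_u = [ru.get(i, 0) for i in items]            # rating, or 0 if not rated
    r_v = [rv.get(i, 0) for i in items]
    b_u = [1 if i in ru else 0 for i in items]     # 'has rated' flags
    b_v = [1 if i in rv else 0 for i in items]
    dot = lambda x, y: sum(p * q for p, q in zip(x, y))
    num = (dot([r * r for r in r_u], b_v)
           - 2 * dot(r_u, r_v)
           + dot(b_u, [r * r for r in r_v]))
    return num / dot(b_u, b_v)


items = ["a", "b", "c", "d"]
u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}
distance = msd_from_inner_products(u, v, items)
```

All four vectors contain integers when the ratings are integers, so no scaling is needed before encrypting them.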

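The computation of normalized Manhattan distances is somewhat more
complicated. Assuming the set of possible ratings to be given by $X$, we first define for each
rating $x \in X$

\[ b^x_{ui} = \begin{cases} 1 & \text{if } b_{ui} = 1 \wedge r_{ui} = x, \\ 0 & \text{otherwise,} \end{cases} \]

and

\[ a^x_{ui} = \begin{cases} |r_{ui} - x| & \text{if } b_{ui} = 1, \\ 0 & \text{otherwise.} \end{cases} \]

Now, (3) can be rewritten into

\[ \frac{\sum_{x \in X} \sum_{i \in I} b^x_{ui} a^x_{vi}}{\sum_{i \in I} b_{ui} b_{vi}}. \]

So, the normalized Manhattan distance can be computed from $|X| + 1$ inner
products. Furthermore, a user $v$ can compute $\prod_{x \in X} \varepsilon(\sum_{i \in I} b^x_{ui} a^x_{vi}) \equiv \varepsilon(\sum_{x \in X} \sum_{i \in I} b^x_{ui} a^x_{vi})$,
and send this result, together with the encrypted denominator, back to user $u$.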
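The decomposition per rating value can be checked in plaintext as follows. The sketch assumes a rating scale of 1 to 5 and dictionary profiles; for each possible rating x, user u contributes an indicator vector and user v contributes the absolute differences to x.

```python
def manhattan_from_inner_products(ru, rv, items, scale=(1, 2, 3, 4, 5)):
    """Normalized Manhattan distance via |X| + 1 inner products over all items."""
    dot = lambda x, y: sum(p * q for p, q in zip(x, y))
    total = 0
    for x in scale:
        b_x_u = [1 if ru.get(i) == x else 0 for i in items]         # u rated i with x
        a_x_v = [abs(rv[i] - x) if i in rv else 0 for i in items]   # |r_vi - x| if v rated i
        total += dot(b_x_u, a_x_v)
    overlap = dot([1 if i in ru else 0 for i in items],
                  [1 if i in rv else 0 for i in items])
    return total / overlap


items = ["a", "b", "c", "d"]
u = {"a": 5, "b": 3, "c": 1}
v = {"a": 4, "b": 3, "c": 2, "d": 5}
distance = manhattan_from_inner_products(u, v, items)
```

Only the indicator vector of user u that matches his actual rating is non-zero for a given item, so each jointly rated item contributes exactly one absolute difference.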

The majority-voting measure can also be computed in the above way, by defining

\[ a^x_{ui} = \begin{cases} 1 & \text{if } b_{ui} = 1 \wedge r_{ui} \approx x, \\ 0 & \text{otherwise.} \end{cases} \tag{12} \]

Then, $c_{uv}$ used in (4) is given by

\[ c_{uv} = \sum_{x \in X} \sum_{i \in I} b^x_{ui} a^x_{vi}, \]

which can again be computed in a way as described above. Furthermore,

\[ d_{uv} = |I_u \cap I_v| - c_{uv} = \sum_{i \in I} b_{ui} b_{vi} - c_{uv}. \]

Finally, we consider the weighted kappa measure. Again, $o_{uv}$ can be computed by
defining

0 otherwise,

and then calculating

Furthermore, $e_{uv}$ can be computed in an encrypted way if user $u$ encrypts $p_u(x)$ for
all $x \in X$ and sends them to each other user $v$, who can then compute

and send this back to $u$ for decryption.



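Computing predictions on encrypted data

For the second step of collaborative filtering, user $u$ can calculate a prediction for
item $i$ in the following way. First, we rewrite the quotient in (5) into

\[ \frac{\sum_{v \in U_i} s(u,v)\,(r_{vi} - \bar{r}_v)}{\sum_{v \in U_i} |s(u,v)|} = \frac{\sum_{j=1}^{m-1} s(u,v_j)\, q_{v_j i}}{\sum_{j=1}^{m-1} |s(u,v_j)|\, b_{v_j i}}. \]

So, first user $u$ encrypts $s(u,v_j)$ and $|s(u,v_j)|$ for each other user $v_j$, $j = 1, \ldots, m-1$,
and sends them to the server. The server then forwards each pair
$\varepsilon(s(u,v_j))$, $\varepsilon(|s(u,v_j)|)$ to the respective user $v_j$, who computes
$\varepsilon(s(u,v_j))^{q_{v_j i}} \varepsilon(0) \equiv \varepsilon(s(u,v_j)\, q_{v_j i})$ and
$\varepsilon(|s(u,v_j)|)^{b_{v_j i}} \varepsilon(0) \equiv \varepsilon(|s(u,v_j)|\, b_{v_j i})$, where he uses reblinding
to prevent the server from getting knowledge from the data going back and forth to
user $v_j$ by trying a few possible values. Each user $v_j$ next sends the results back to
the server, which then computes

\[ \prod_{j=1}^{m-1} \varepsilon(s(u,v_j)\, q_{v_j i}) \equiv \varepsilon\Bigl(\sum_{j=1}^{m-1} s(u,v_j)\, q_{v_j i}\Bigr) \pmod{n^2} \]

and

\[ \prod_{j=1}^{m-1} \varepsilon(|s(u,v_j)|\, b_{v_j i}) \equiv \varepsilon\Bigl(\sum_{j=1}^{m-1} |s(u,v_j)|\, b_{v_j i}\Bigr) \pmod{n^2} \]

and sends these results back to user $u$. User $u$ can then decrypt these messages and
use them to compute the prediction. The simple prediction formula of (6) can be
handled in a similar way.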
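The three-party prediction exchange can be sketched with a toy Paillier implementation. Everything specific here is an assumption made for illustration: the tiny primes, the generator $g = n + 1$, the scale factor of 100 for turning similarities and mean-centred ratings into integers, the wrap-around encoding of negative values, and all function names.

```python
import math
import secrets

SCALE = 100  # similarities and mean-centred ratings are sent as integers scaled by 100


def keygen(p, q):
    return p * q, (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)


def encrypt(n, m):
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2


def decrypt_signed(n, lam, c):
    n2 = n * n
    l_of = lambda x: (x - 1) // n
    m = l_of(pow(c, lam, n2)) * pow(l_of(pow(n + 1, lam, n2)), -1, n) % n
    return m - n if m > n // 2 else m        # upper half of Z_n encodes negatives


def neighbour_step(n, enc_s, enc_abs_s, q_vi, b_vi):
    """User v_j: raise to own (scaled) values and reblind with enc(0)."""
    n2 = n * n
    part_num = pow(enc_s, round(q_vi * SCALE) % n, n2) * encrypt(n, 0) % n2
    part_den = pow(enc_abs_s, b_vi, n2) * encrypt(n, 0) % n2
    return part_num, part_den


def server_step(n, parts):
    """Server: multiply ciphertexts, which adds the hidden plaintexts."""
    n2 = n * n
    num = den = 1
    for part_num, part_den in parts:
        num, den = num * part_num % n2, den * part_den % n2
    return num, den


n, lam = keygen(1009, 1013)                  # user u's toy key pair
sims = [0.8, -0.5]                           # s(u, v_j), known only to u
neighbours = [(1.0, 1), (-1.0, 1)]           # (q_vi, b_vi) held by each v_j
parts = [
    neighbour_step(n, encrypt(n, round(s * SCALE)), encrypt(n, round(abs(s) * SCALE)), q, b)
    for s, (q, b) in zip(sims, neighbours)
]
enc_num, enc_den = server_step(n, parts)
# The numerator carries SCALE twice and the denominator once, so one SCALE remains.
quotient = decrypt_signed(n, lam, enc_num) / SCALE / decrypt_signed(n, lam, enc_den)
prediction = 3.0 + quotient                  # add user u's own average rating
```

The reblinding matters most when a neighbour's exponent is 0 or 1: without the fresh encryption of zero, the returned ciphertext would be 1 or the unchanged input, which the server could recognise.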

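The maximum total similarity prediction as given by (7) can be handled as
follows. First, we rewrite

\[ \sum_{v \in U_i^x} s(u,v) = \sum_{j=1}^{m-1} s(u,v_j)\, a^x_{v_j i}, \]

where $a^x_{v_j i}$ is as defined by (12). Next, user $u$ encrypts $s(u,v_j)$ for each other
user $v_j$, $j = 1, \ldots, m-1$, and sends them to the server. The server then
forwards each $\varepsilon(s(u,v_j))$ to the respective user $v_j$, who computes
$\varepsilon(s(u,v_j))^{a^x_{v_j i}} \varepsilon(0) \equiv \varepsilon(s(u,v_j)\, a^x_{v_j i})$, for each rating $x \in X$, using reblinding. Next, each user sends these
$|X|$ results back to the server, which then computes

\[ \prod_{j=1}^{m-1} \varepsilon(s(u,v_j)\, a^x_{v_j i}) \equiv \varepsilon\Bigl(\sum_{j=1}^{m-1} s(u,v_j)\, a^x_{v_j i}\Bigr) \pmod{n^2} \]

for each $x \in X$, and sends the $|X|$ results to user $u$. Finally, user $u$ decrypts these
results and determines the rating $x$ that has the highest result.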



Encrypted item-based algorithm

Also item-based collaborative filtering can be done on encrypted data, using
the threshold system of the Paillier cryptosystem. In such a system the decryption key is
shared among a number $l$ of users, and a ciphertext can only be decrypted if more than a
threshold $t$ of users cooperate. In this system, the generation of the keys is somewhat more
complicated, as well as the decryption mechanism. For the decryption procedure in the
threshold cryptosystem, first a subset of at least $t+1$ users is chosen that will be involved in
the decryption. Next, each of these users receives the ciphertext and computes a decryption
share, using his own share of the key. Finally, these decryption shares are combined to
compute the original message. As long as at least $t+1$ users have combined their decryption
share, the original message can be reconstructed.

The general working of the item-based approach is slightly different from the
user-based approach, as first the server determines similarities between items, and next uses
them to make predictions.

Compared to the known set-up of collaborative filtering, the embodiment of
the implementation of the collaborative filtering, according to the present invention, requires
a more active role of the devices 110, 190, 191, 199. This means that instead of a (single)
server that runs an algorithm in the prior art, we now have a system running a distributed
algorithm, where all the nodes are actively involved in parts of the algorithm. The time
complexity of the algorithm basically stays the same, except for an additional factor $|X|$ for
some similarity measures and prediction formulas, and the fact that the new set-up allows for
parallel computations.

Various computer program products may implement the functions of the
device and method of the present invention and may be combined in several ways with the
hardware or located in different other devices.

Variations and modifications of the described embodiment are possible within
the scope of the inventive concept. For example, the server 150 in Figure 1 may comprise the
computation means to obtain an encrypted inner product between the first data and the
second data, or encrypted sums of shares of the first and second data in the similarity value,
and the server is coupled to a public-key decryption server for decrypting the encrypted inner
product or the sums of shares and obtaining the similarity value. As another example, the
general concept of the invention can be mapped in a variety of manners onto the value chain,
i.e., on the business models of the interlinked commercial activities by different legal entities
that in the end enable the provision of a service to the consumer. An embodiment of the invention
involves enabling a consumer to supply encrypted data and an identifier representative of the
consumer via a data network, e.g., the Internet. The relationship between the identifiers and
the encrypted data of various consumers is broken in order to provide privacy. For example, a
server substitutes another (e.g., temporary or session-related) identifier before passing on the
encrypted data. The encrypted data of a consumer is then processed in the encrypted domain
to calculate similarity values, either at a dedicated server or at another consumer, both being
unable to decrypt the encrypted data.

WO 2005/015462    PCT/IB2004/051399

The use of the verb 'to comprise' and its conjugations does not exclude the
presence of elements or steps other than those defined in a claim. The invention can be
implemented by means of hardware comprising several distinct elements, and by means of a
suitably programmed computer. In the system claim enumerating several means, several of
these means can be embodied by one and the same item of hardware.

A 'computer program' is to be understood to mean any software product
stored on a computer-readable medium, such as a floppy-disk, downloadable via a network,
such as the Internet, or marketable in any other manner.



